- Embedding Generation: Generate embeddings with the Nomic embedding model served by Triton
- RAG Support: Retrieval-Augmented Generation with Elasticsearch vector stores
- Hybrid Retrieval: Query fusion retriever combining multiple data sources
- LLM Reranking: Rerank retrieved documents using an LLM
- MongoDB Integration: Store and retrieve prompts and configurations
- Health Checks: Monitor service status and component availability
- Async Support: Asynchronous operations for better performance
```
embedding-service/
├── app/
│   ├── app.py                      # Application entry point
│   ├── single_index.py             # Main FastAPI application
│   ├── triton_nomic_embedding.py   # Triton embedding client
│   └── main.py                     # (unused)
├── mongo/
│   └── mongodbservice.py           # MongoDB service and repositories
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment variables template
└── README.md                       # This file
```
1. Clone the repository:

   ```bash
   cd /Users/vishan/PycharmProjects/Embedding-Service
   ```

2. Create a virtual environment:

   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On macOS/Linux
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Configure environment variables:

   ```bash
   cp .env.example .env  # Edit .env with your configuration
   ```
Edit the .env file with your settings:
```bash
# Server
PORT=8000
HOST=0.0.0.0

# LLM Configuration
LLM_MODEL_NAME=llama-2-13b-chat
LLM_HOST=http://localhost:8000/v1
LLM_API_KEY=your-api-key

# Embedding Model
EMBEDDING_MODEL_NAME=nomic-ai_nomic-embed-text-v1.5-ensemble
EMBEDDING_API_BASE=http://localhost:8000

# MongoDB
MONGO_HOST=localhost
MONGO_USERNAME=admin
MONGO_PASSWORD=password

# Elasticsearch
ES_HOST=localhost
ES_USER=elastic
ES_PASSWORD=password
SECURITY_REPORT_INDEX_NAME=security_reports
CVE_INDEX_NAME=cve_data

# Retrieval Settings
TOP_K_AFTER_RERANK=5
SIMILARITY_TOP_K=10
```

Run the application in any of the following ways:

```bash
cd app
python app.py
```

```bash
cd app
uvicorn single_index:app --host 0.0.0.0 --port 8000
```

```bash
cd app
python single_index.py
```

GET /

Returns service information and available endpoints.
Response:
```json
{
  "service": "Embedding Service API",
  "version": "1.0.0",
  "status": "running",
  "endpoints": {
    "health": "/health",
    "embeddings": "/v1/embeddings",
    "prompt": "/v1/prompt",
    "retrieve": "/v1/retrieve"
  }
}
```

GET /health

Check service health and component status.
Response:
```json
{
  "status": "healthy",
  "embedding_model": true,
  "vector_stores": true,
  "mongodb": true
}
```

POST /v1/embeddings

Generate embeddings for provided texts.
Request:
```json
{
  "texts": ["Hello world", "How are you?"]
}
```

Response:
```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "model": "nomic-ai_nomic-embed-text-v1.5-ensemble",
  "dimensions": 768
}
```

POST /v1/retrieve

Retrieve relevant documents without generating a response.
Request:
```json
{
  "query": "What are the security vulnerabilities?",
  "summary": "Optional summary"
}
```

Response:
```json
{
  "query": "What are the security vulnerabilities?",
  "documents": [
    {
      "page": 1,
      "file_path": "/path/to/file.pdf",
      "file_name": "security_report.pdf",
      "score": 0.95,
      "text": "Document text...",
      "type": "pdf",
      "others": {}
    }
  ],
  "count": 1,
  "has_context": true
}
```

POST /v1/prompt

Generate a RAG-enhanced prompt with retrieved context.
Request:
```json
{
  "query": "Explain the CVE-2023-1234",
  "summary": "Optional summary"
}
```

Response:
```json
{
  "response": "Context 1: ...\nContext 2: ...",
  "metadata_list": [...],
  "prompt": "Formatted prompt with context",
  "system_message": "System prompt",
  "has_context": true,
  "retrievers_list": ["security_reports", "cve_store"]
}
```

Using cURL:

```bash
# Health check
curl http://localhost:8000/health

# Generate embeddings
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Hello world", "Test embedding"]}'

# Retrieve documents
curl -X POST http://localhost:8000/v1/retrieve \
  -H "Content-Type: application/json" \
  -d '{"query": "security vulnerabilities"}'
```

Using Python:

```python
import requests

# Generate embeddings
response = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"texts": ["Hello world", "Test embedding"]}
)
print(response.json())

# Retrieve documents
response = requests.post(
    "http://localhost:8000/v1/retrieve",
    json={"query": "security vulnerabilities"}
)
print(response.json())
```

The Triton embedding client (app/triton_nomic_embedding.py):

- Connects to Triton Inference Server
- Supports batch processing
- Handles base64 encoding for text inputs
- Applies L2 normalization and mean pooling
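The base64 encoding, mean pooling, and L2 normalization steps can be sketched in plain Python. This is a minimal illustration with made-up shapes and helper names, not the actual client code, which operates on Triton outputs:

```python
import base64
import math

def encode_text(text: str) -> bytes:
    # Texts are base64-encoded before being sent to Triton
    return base64.b64encode(text.encode("utf-8"))

def mean_pool(token_embeddings: list[list[float]]) -> list[float]:
    # Average the per-token vectors into a single sentence vector
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[d] for tok in token_embeddings) / n for d in range(dim)]

def l2_normalize(vec: list[float]) -> list[float]:
    # Scale the vector to unit length
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

tokens = [[1.0, 2.0], [3.0, 4.0]]   # two token vectors of dimension 2
embedding = l2_normalize(mean_pool(tokens))
```

After normalization the dot product of two embeddings is their cosine similarity, which is what makes the vectors directly comparable in the vector store.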
- Elasticsearch: Primary vector store for document retrieval
- Hybrid Retrieval: Combines multiple retrievers using reciprocal rank fusion
- LLM Reranking: Uses an LLM to rerank retrieved documents
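Reciprocal rank fusion merges ranked lists by summing `1/(k + rank)` for each document across retrievers. A minimal sketch (the constant `k = 60` is the common default; the document IDs are made up, and the real logic lives inside the query fusion retriever):

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    # Each inner list is one retriever's results, best first.
    # A document's fused score is the sum of 1/(k + rank) over
    # every list in which it appears; higher is better.
    scores: dict[str, float] = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["cve-1", "report-7", "cve-3"],      # e.g. hits from the CVE index
    ["report-7", "report-2", "cve-1"],   # e.g. hits from the security reports index
])
```

Documents that rank well in several retrievers float to the top even when no single retriever ranked them first, which is why fusion works well for combining lexical and vector search.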
The MongoDB layer (mongo/mongodbservice.py):

- Stores prompts and configurations
- Singleton pattern for connection pooling
- Automatic reconnection handling
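The singleton pattern ensures every caller shares one client object, and therefore one connection pool. A schematic sketch, with a plain object standing in for the real pymongo `MongoClient`:

```python
class MongoService:
    """Schematic singleton; the real service wraps a pymongo MongoClient."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the underlying client exactly once; later
        # instantiations return the same object and reuse its pool.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.client = object()  # stand-in for MongoClient(...)
        return cls._instance

a = MongoService()
b = MongoService()
```

Because pymongo's `MongoClient` already pools connections internally, sharing one instance this way is cheaper than constructing a client per request.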
The service provides graceful degradation:
- If Elasticsearch is unavailable, embeddings-only mode is enabled
- If MongoDB is unavailable, uses default prompts
- All endpoints return proper HTTP status codes and error messages
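The health payload shown earlier can be assembled from per-component probes. A hedged sketch of the aggregation logic; the probe results are passed in as booleans here, and the "degraded" status value is an assumption for illustration:

```python
def health_report(embedding_ok: bool, es_ok: bool, mongo_ok: bool) -> dict:
    # Overall status is "healthy" only when every component probe
    # succeeds; otherwise the service keeps running with reduced
    # functionality and reports itself as degraded.
    components = {
        "embedding_model": embedding_ok,
        "vector_stores": es_ok,
        "mongodb": mongo_ok,
    }
    status = "healthy" if all(components.values()) else "degraded"
    return {"status": status, **components}

report = health_report(True, False, True)
```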
For development with auto-reload:

```bash
cd app
uvicorn single_index:app --reload --host 0.0.0.0 --port 8000
```

Debugging:

```bash
# Check Python syntax
python -m py_compile app/single_index.py

# Run with debug logging
LOG_LEVEL=DEBUG python app/app.py
```

For deployment with systemd, create /etc/systemd/system/embedding-service.service:
```ini
[Unit]
Description=Embedding Service API
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/Embedding-Service/app
Environment="PATH=/path/to/.venv/bin"
ExecStart=/path/to/.venv/bin/python app.py
Restart=always

[Install]
WantedBy=multi-user.target
```

Then reload systemd and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable embedding-service
sudo systemctl start embedding-service
```

If you encounter import errors with MongoDB: the service uses `sys.path.append` to handle imports, so make sure you're running from the correct directory.

For connection issues:

- Verify Triton server is running and accessible
- Check Elasticsearch cluster status
- Verify MongoDB connection string
Performance tuning:

- Adjust `max_batch_size` in TritonNomicEmbedding for better throughput
- Tune `SIMILARITY_TOP_K` and `TOP_K_AFTER_RERANK` for retrieval quality
- Use connection pooling for MongoDB (already configured)
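The effect of `max_batch_size` can be illustrated with a simple client-side chunking helper. This is a hypothetical sketch of how a batched embedding call might split its input, not the actual TritonNomicEmbedding code:

```python
def batched(texts: list[str], max_batch_size: int = 32):
    # Split the input into chunks no larger than max_batch_size,
    # so each Triton request stays within the model's batch limit.
    for i in range(0, len(texts), max_batch_size):
        yield texts[i:i + max_batch_size]

# 70 texts with a batch limit of 32 yield three requests: 32, 32, and 6
batches = list(batched([f"doc {i}" for i in range(70)], max_batch_size=32))
```

Larger batches amortize per-request overhead, at the cost of higher latency per request and more memory on the inference server.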
A comprehensive FastAPI-based embedding service with RAG (Retrieval-Augmented Generation) capabilities, supporting Triton inference server, Elasticsearch vector stores, and MongoDB for prompt management.